A flexible shrinkage operator for fussy grouped variable selection

Author

  • Xiaoli Gao
Abstract

Existing grouped variable selection methods rely heavily on prior group information and thus may not be reliable if an incorrect group assignment is used. In this paper, we propose a family of shrinkage variable selection operators by controlling the k-th largest norm (KAN). The proposed KAN method naturally exhibits flexible group-wise variable selection even when no correct prior group information is available. We also construct a group KAN shrinkage operator using a composite of KAN constraints. Neither ignoring nor relying completely on prior group information, the group KAN method has the flexibility of controlling within-group strength and can therefore reduce the effect of incorrect group information. Finally, we investigate an unbiased estimator of the degrees of freedom for (group) KAN estimates in the framework of Stein's unbiased risk estimation. Extensive simulation studies and real data analysis are performed to demonstrate the advantage of KAN and group KAN over the LASSO and group LASSO, respectively.
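To make the motivation concrete, the sketch below illustrates how the two baselines named in the abstract (the LASSO and the group LASSO) shrink a coefficient vector, and how a wrong group assignment changes what the group LASSO selects. It is not the KAN operator, whose definition is not given on this page. It assumes the simplest textbook setting (an orthonormal design, where both estimators reduce to closed-form thresholding rules), and the function names, the vector `z`, the group lists, and the penalty level are all hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding: proximal operator of lam * ||z||_1 (LASSO)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def group_soft_threshold(z, groups, lam):
    """Blockwise soft-thresholding: proximal operator of lam * sum_g ||z_g||_2
    (group LASSO without group-size weights, an assumption made for brevity)."""
    out = np.zeros_like(z, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        if norm > lam:
            # Shrink the whole block toward zero by a common factor.
            out[idx] = (1.0 - lam / norm) * z[idx]
        # Otherwise the whole block stays at zero.
    return out

# Hypothetical unpenalized estimates: two strong signals, two near-zero ones.
z = np.array([3.0, 2.5, 0.2, 0.1])
correct_groups = [[0, 1], [2, 3]]   # signals together, noise together
wrong_groups = [[0, 2], [1, 3]]     # each noise term paired with a signal

lasso = soft_threshold(z, 1.0)
good = group_soft_threshold(z, correct_groups, 1.0)
bad = group_soft_threshold(z, wrong_groups, 1.0)

print("LASSO:                 ", lasso)
print("group LASSO (correct): ", good)
print("group LASSO (wrong):   ", bad)
```

With the correct grouping the two near-zero coefficients are removed as a block, whereas with the wrong grouping each of them is retained because it shares a block with a strong signal. This is the sensitivity to group assignment that the abstract describes.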


Similar resources

Differenced-Based Double Shrinking in Partial Linear Models

The partial linear model is very flexible when the relation between the covariates and responses may be either parametric or nonparametric. However, estimation of the regression coefficients is challenging since one must also estimate the nonparametric component simultaneously. As a remedy, the differencing approach, to eliminate the nonparametric component and estimate the regression coefficients, can ...

Joint Variable Selection and Classification with Immunohistochemical Data

To determine if candidate cancer biomarkers have utility in a clinical setting, validation using immunohistochemical methods is typically done. Most analyses of such data have not incorporated the multivariate nature of the staining profiles. In this article, we consider modelling such data using recently developed ideas from the machine learning community. In particular, we consider the joint ...

Shrinkage Estimation of Semiparametric Model with Missing Responses for Cluster Data

This paper simultaneously investigates variable selection and imputation estimation of a semiparametric partially linear varying-coefficient model in the case where there exist missing responses for cluster data. As is well known, a commonly used approach to dealing with missing data is complete-case data. Combined the idea of complete-case data with a discussion of shrinkage estimation is made on di...

Variable selection for multiply-imputed data with application to dioxin exposure study.

Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes...

Variable Selection in Nonparametric and Semiparametric Regression Models

This chapter reviews the literature on variable selection in nonparametric and semiparametric regression models via shrinkage. We highlight recent developments on simultaneous variable selection and estimation through the methods of least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD) or their variants, but restrict our attention to nonparametric a...


Publication date: 2017